Abstract: Because scientific papers are semi-structured documents, a hierarchical classification model based on their metadata is proposed, where the metadata include titles, keyword sets, abstracts, and so on.
Addressing the differences between text classification and question classification, this paper uses dependency parsing to extract the main stem of the question, the interrogative word, and its dependent constituents, combines them with stem-related word pairs, and applies a support vector machine classifier. This greatly reduces noise in question classification, highlights its main features, and takes the syntactic relations between words into account, achieving good results. At the same time, because ordinary hierarchical classification performs poorly on question classification, this paper proposes a new method for Chinese question hierarchical classification that combines key class features with the syntactic features of the question. By extracting classification features through syntactic analysis and adding syntactic information to question classification, the method reaches a precision of 88.25% on coarse classes and 73.15% on fine classes, each about 10 percentage points higher than traditional hierarchical classification, which shows that the method is effective.
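The coarse-then-fine decision described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the training questions, the label names, and the word-overlap scorer are all hypothetical, and the scorer is a simple stand-in for the support vector machine and the dependency-based features that the paper actually uses.

```python
from collections import Counter

# Hypothetical toy data: (question tokens, coarse label, fine label).
# A real system would use features extracted by a dependency parser.
TRAIN = [
    (["who", "invented", "telephone"], "HUMAN", "HUMAN:person"),
    (["who", "founded", "company"], "HUMAN", "HUMAN:person"),
    (["which", "team", "won", "cup"], "HUMAN", "HUMAN:group"),
    (["where", "is", "paris", "city"], "LOCATION", "LOCATION:city"),
    (["which", "country", "largest"], "LOCATION", "LOCATION:country"),
]


def train_profiles(samples):
    """Build one bag-of-words Counter per label (stand-in for an SVM)."""
    profiles = {}
    for tokens, label in samples:
        profiles.setdefault(label, Counter()).update(tokens)
    return profiles


def predict(profiles, tokens):
    """Pick the label whose profile overlaps most with the tokens.

    Labels are sorted first so that ties are broken deterministically.
    """
    return max(sorted(profiles),
               key=lambda label: sum(profiles[label][t] for t in tokens))


# Level 1: one model over the coarse classes.
coarse_model = train_profiles([(t, c) for t, c, _ in TRAIN])

# Level 2: one model per coarse class, trained only on its own fine classes.
fine_models = {
    coarse: train_profiles([(t, f) for t, c, f in TRAIN if c == coarse])
    for coarse in {c for _, c, _ in TRAIN}
}


def classify(tokens):
    """Classify a question hierarchically: coarse class first, then fine."""
    coarse = predict(coarse_model, tokens)
    fine = predict(fine_models[coarse], tokens)
    return coarse, fine


print(classify(["who", "invented", "radio"]))
print(classify(["which", "country", "is", "biggest"]))
```

Restricting the second-level model to the fine classes of the predicted coarse class keeps each decision small, but it also means an error at the coarse level cannot be recovered at the fine level.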